Parallel modular multiplication using 512-bit advanced vector instructions

Abstract

Applications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate a 1.9× improvement in decryption throughput in comparison with OpenSSL and a 1.5× improvement in exponentiation compared to GMP-6.1.2 on an Intel Xeon CPU. In addition, we show a 1.4× improvement over state-of-the-art implementations on many-core Knights Landing Xeon Phi hardware. Finally, we show how interleaving Chinese remainder theorem-based RSA calculations within our parallel BPS technique halves latency while providing protection against fault-injection attacks.
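
For readers unfamiliar with the underlying primitive, the following is a minimal sketch of textbook Montgomery multiplication using Python big integers. It is not the paper's Block Product Scanning method and uses no vector instructions; it only illustrates the reduction step that BPS builds on. All function names and the 64-bit word size are assumptions made for illustration.

```python
# Textbook Montgomery multiplication (REDC) on Python integers.
# Illustrative only: NOT the BPS/AVX-512 method described in the abstract.
# Requires Python 3.8+ for pow(n, -1, r).

def montgomery_setup(n, word_bits=64):
    """Return (k, n_prime) for an odd modulus n, with R = 2**k > n."""
    if n < 3 or n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus >= 3")
    # Round the bit length of n up to a whole number of words.
    k = (n.bit_length() + word_bits - 1) // word_bits * word_bits
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    return k, n_prime

def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^-1 mod n for inputs in [0, n)."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R
    return u - n if u >= n else u      # one conditional subtraction

def to_mont(x, n, k):
    """Map x to its Montgomery form x * R mod n."""
    return (x << k) % n

def from_mont(x_bar, n, k, n_prime):
    """Map a Montgomery-form value back to an ordinary residue."""
    return mont_mul(x_bar, 1, n, k, n_prime)

def mod_mul(a, b, n):
    """Compute a * b mod n via the Montgomery domain."""
    k, n_prime = montgomery_setup(n)
    prod_bar = mont_mul(to_mont(a, n, k), to_mont(b, n, k), n, k, n_prime)
    return from_mont(prod_bar, n, k, n_prime)
```

In practice the conversion into and out of Montgomery form is paid once per exponentiation, not once per multiplication, which is why the method suits RSA-style workloads.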

Similar resources

Montgomery Multiplication Using Vector Instructions

In this paper we present a parallel approach to compute interleaved Montgomery multiplication. This approach is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices which run the x86 architecture (Intel ...

Multiplication, Division and Shift Instructions in Parallel Random Access Machines

Trahan, J.L., M.C. Loui and V. Ramachandran, Multiplication, division and shift instructions in parallel random access machines, Theoretical Computer Science 100 (1992) 1-44. We prove that polynomial time on a parallel random access machine (PRAM) with unit-cost multiplication and division or on a PRAM with unit-cost shifts is equivalent to polynomial space on a Turing machine (PSPACE). This ex...

An Efficient Parallel CMM-CSD Modular Exponentiation Algorithm by Using a New Modified Modular Multiplication Algorithm

This paper presents a new modified Montgomery modular multiplication algorithm based on canonical signed-digit (CSD) representation and the sliding window method. In this modified Montgomery modular multiplication algorithm, a signed-digit recoding technique is used to increase the probability of zero bits. The sliding window method is also used to reduce the multiplication steps consi...

Advanced Bit Manipulation Instructions: Architecture, Implementation and Applications

Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence of instructions needed to emulate these complicated bit operations. As these bit manipulation operations are relevant to applications that are becoming increasingly important, we propose direct support for them in mic...

BLAKE and 256-bit advanced vector extensions

Intel recently documented its AVX2 instruction set extension that introduces support for 256-bit wide single-instruction multiple-data (SIMD) integer arithmetic over double (32-bit) and quad (64-bit) words. This will enable Intel's future processors, starting with the Haswell architecture to be released in 2013, to fully support 4-way SIMD computation of 64-bit ARX algorithms (32-bit is alread...

Journal

Journal title: Journal of Cryptographic Engineering

Year: 2021

ISSN: 2190-8508, 2190-8516

DOI: https://doi.org/10.1007/s13389-021-00256-9